optimizing binary size
2022-02-20 · 1 min read
Setup #
# for running cargo bloat
$ RUSTFLAGS="-C target-cpu=native" cargo install cargo-bloat
# for running cargo size (among other things)
$ rustup component add llvm-tools-preview
$ rustup +nightly component add llvm-tools-preview # (for strip=symbols)
$ RUSTFLAGS="-C target-cpu=native" cargo install cargo-binutils
TL;DR #
# compile and sort functions by binary size
$ cargo bloat --release -n 100
# only like 20-25% of the binary size seems to be our code or other relevant
# stuff like ndarray. The rest seems to be mostly panic and fmt
# infrastructure...
# compile and sort crates by binary size
$ cargo bloat --release --crates
# print the size of each section in the binary
$ cargo size --bin my-bin --release -- -A
# if you already have a built rust binary, you can run
# rust-size directly:
$ rust-size -A target/release/my-bin
# rust-size with nice sorted and human-readable output:
$ rust-size -A target/release/my-bin \
| tail -n +2 \
| sort --numeric-sort --key=2 \
| numfmt --header=3 --to=iec-i --suffix=B --field=2
# strip debug info and all symbols, then print section size
# (-Z strip needs nightly; since Rust 1.59, stable also accepts -C strip=symbols)
$ RUSTFLAGS="-Z strip=symbols -C target-cpu=native" cargo +nightly \
size --bin my-bin --release -- -A
# before: 1.8 MiB! looking at the sections, it's mostly debug info.
# "-Z strip=symbols" brings this down to like 330-400 KiB (depending on
# other flags etc...)
# TODO: cargo-binutils also installed `cargo strip`; maybe that's helpful?
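The sorted `rust-size` pipeline above relies on `tail` and `--header=3` to step around the non-numeric lines, which gets fragile once `sort` reorders them (in the example output further down, the smallest sections come out without a `B` suffix). A minimal sketch of a sturdier variant, assuming GNU `sort` and `numfmt` and the usual three-column `rust-size -A` layout; the function name `fmt_sections` is made up for illustration:

```shell
# Hypothetical helper: reads `rust-size -A` output on stdin and prints one
# "section size" row per section, smallest first, with human-readable sizes.
fmt_sections() {
  # Keep only rows shaped like "<name> <decimal size> <addr>"; this drops the
  # file name line, the column header, blank lines, and the Total row.
  awk 'NF == 3 && $2 ~ /^[0-9]+$/ { print $1, $2 }' \
    | sort -k2,2n \
    | numfmt --to=iec-i --suffix=B --field=2
}

# Usage (assumes a built binary at this path):
#   rust-size -A target/release/my-bin | fmt_sections
```

Filtering rows by shape instead of by position means the output no longer depends on where the header lines end up after sorting.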
Cargo.toml #
[profile.release]
codegen-units = 1
lto = true
panic = "abort"
# opt-level = "s" # optimize for size
# opt-level = "z" # optimize for size, and also turn off loop vectorization
opt-level = 3
debug = 0
Compile std with panic = "abort" #
- shaves off maybe 150 KiB?
- removes a decent chunk of the backtrace/unwind infrastructure
# .cargo/config.toml
[unstable]
build-std = ["std", "panic_abort"]
build-std-features = [] # <- turns off backtrace+unwind features
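`build-std` only takes effect on a nightly toolchain with the `rust-src` component installed, and Cargo expects an explicit `--target` when it is enabled. A small sketch of how the build might be invoked with this config in place; the helper name `host_triple` is hypothetical, and the commented commands assume you are building for the host platform:

```shell
# Hypothetical helper: pulls the host target triple out of `rustc -vV`
# output read from stdin (the line looks like "host: x86_64-unknown-linux-gnu").
host_triple() {
  sed -n 's/^host: //p'
}

# Usage (nightly only, not run here):
#   rustup component add rust-src --toolchain nightly
#   cargo +nightly build --release --target "$(rustc +nightly -vV | host_triple)"
```

With `--target`, the binary lands under `target/<triple>/release/` rather than `target/release/`, so the `rust-size` paths need adjusting accordingly.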
WASM #
https://rustwasm.github.io/twiggy/
Example: fixing bloat #
Let me just run a quick smoketest (which depends on almost every crate in the monorepo)...
$ cargo test -p smoketest
# ..
Finished test [unoptimized + debuginfo] target(s) in 1m 28s
Running unittests src/lib.rs (target/debug/deps/smoketest-bd637d7668a0b714)
# ..
Man that sure took a while to link, I wonder how big the binary is?
$ ls -lah target/debug/deps/smoketest-bd637d7668a0b714
-rwxrwxr-x 1 phlip9 phlip9 880M May 4 11:19 target/debug/deps/smoketest-bd637d7668a0b714
JESUS. RIP MY SSD.
$ rustup component add llvm-tools-preview
$ rust-size -A target/debug/deps/smoketest-bd637d7668a0b714 \
| tail -n +2 \
| sort --numeric-sort --key=2 \
| numfmt --header=3 --to=iec-i --suffix=B --field=2
section size addr
.fini_array 8 63669912
.fini 13 52345408
.init_array 16 63669896
.plt.got 24B 3608704
.init 27B 3608576
.interp 28B 848
.note.ABI-tag 32B 948
.note.gnu.property 32B 880
.debug_gdb_scripts 34B 55423896
.note.gnu.build-id 36B 912
.comment 43B 0
.gnu.hash 48B 984
.tdata 72B 63669824
.plt 96B 3608608
.rela.plt 120B 3605536
.gnu.version 318B 7088
.gnu.version_r 432B 7408
.dynamic 544B 64809064
.tbss 696B 63669896
.bss 2.1KiB 65535456
.dynstr 2.2KiB 4848
.dynsym 3.8KiB 1032
.debug_macro 12KiB 0
.data 24KiB 65511424
.got 686KiB 64809608
.data.rel.ro 1.1MiB 63669920
.gcc_except_table 1.4MiB 62213696
.eh_frame_hdr 1.5MiB 55423932
.debug_abbrev 2.3MiB 0
.rodata 3.0MiB 52346880
.rela.dyn 3.5MiB 7840
.debug_loc 3.5MiB 0
.debug_aranges 4.9MiB 0
.eh_frame 5.1MiB 56953128
.debug_ranges 15MiB 0
.debug_line 28MiB 0
.text 47MiB 3608768
.debug_pubnames 112MiB 0
.debug_str 175MiB 0
.debug_info 175MiB 0
.debug_pubtypes 275MiB 0
Total 851MiB
WTF IS GOING ON WITH THE .debug_pubtypes SECTION???
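To put one number on how much of the file is debug info overall, the raw `rust-size -A` output can be summed by section prefix. A rough sketch, assuming the three-column `section size addr` layout shown above; `debug_share` is a made-up name:

```shell
# Hypothetical helper: reads raw `rust-size -A` output on stdin and prints
# the percentage of section bytes that live in `.debug_*` sections.
debug_share() {
  awk '
    # Only rows shaped like "<name> <decimal size> <addr>" are sections.
    NF == 3 && $2 ~ /^[0-9]+$/ {
      total += $2
      if ($1 ~ /^\.debug_/) debug += $2
    }
    END { if (total > 0) printf "%.1f%%\n", 100 * debug / total }
  '
}

# Usage (assumes the test binary from above still exists):
#   rust-size -A target/debug/deps/smoketest-bd637d7668a0b714 | debug_share
```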
Ok ok, let's take a look at what we're working with...
$ sudo apt install dwarfdump
$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
| head -n 10
.debug_pubtypes
'ErrorData<alloc::boxed::Box<std::io::error::Custom, alloc::alloc::Global>>'
'alloc::boxed::Box<std::io::error::Custom, alloc::alloc::Global>'
'alloc::boxed::Box<(dyn core::error::Error + core::marker::Send + core::marker::Sync), alloc::alloc::Global>'
'Result<(), std::io::error::Error>'
'NonNull<u8>'
'u8'
'SimpleMessage'
'ErrorKind'
How many types we got?
$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
| wc -l \
| numfmt --to=si
2.3M
Maybe there's some giga types?
$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
| awk '{ print length, $0 }' \
| sort -n -r \
> smoketest_debug_pubtypes
$ head -n 10 smoketest_debug_pubtypes
72011 '{closure_env#0}<&str, &str, n ..
71962 '&mut (nom::sequence::terminat ..
71957 '(nom::sequence::terminated::{ ..
71957 '(nom::sequence::terminated::{ ..
66740 '&mut nom::branch::alt::{closu ..
66717 '{closure_env#0}<&str, &str, n ..
66717 '{closure_env#0}<&str, &str, n ..
66668 '&mut (nom::sequence::terminat ..
66663 '(nom::sequence::terminated::{ ..
66663 '(nom::sequence::terminated::{ ..
Ok, despite nom taking the top 10, it looks like the primary culprit is my arch-nemesis warp. CURSE YOU WARP AND YOUR COMPOSABLE GENERICS.
Let's see what proportion of our .debug_pubtypes is warp...
$ cat smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
271MiB
$ grep "nom" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
3.3MiB
$ grep "warp" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
82MiB
$ grep "lightning" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
45MiB
$ grep "proptest" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
5MiB
$ grep "Vec" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
71MiB
$ grep "hyper" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
95MiB
$ grep "tokio" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
95MiB
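The same four-stage pipeline is repeated for every crate above, so it can be folded into one helper. A sketch, assuming the `<length> <name>` line format of `smoketest_debug_pubtypes` produced earlier; the function name `sum_lengths` is invented here:

```shell
# Hypothetical helper: sum the leading length column of every line in a file
# ($1) that contains a fixed string ($2). An empty pattern matches every line.
sum_lengths() {
  grep -F -- "${2:-}" "$1" \
    | awk '{ sum += $1 } END { print sum + 0 }'
}

# Usage (assumes smoketest_debug_pubtypes exists in the current directory):
#   for name in nom warp lightning proptest Vec hyper tokio; do
#     printf '%-10s %s\n' "$name" \
#       "$(sum_lengths smoketest_debug_pubtypes "$name" | numfmt --to=iec-i --suffix=B)"
#   done
```

The per-pattern sums overlap: a type name that mentions both `warp` and `hyper` is counted under each, so the individual figures can add up to more than the section total.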
A bunch of duplicates...
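If "duplicates" here means the same name showing up more than once, one way to measure it is to collapse identical lines and compare byte totals. A sketch under the assumption that each line of `smoketest_debug_pubtypes` is `<length> <name>`, so identical names produce identical lines; `dup_summary` is a hypothetical name:

```shell
# Hypothetical helper: prints "<total bytes> <bytes if each name appeared once>"
# for a file of "<length> <name>" lines.
dup_summary() {
  # `uniq -c` prefixes each distinct line with its repeat count, giving
  # "<count> <length> <name>" rows for awk to total up.
  sort "$1" \
    | uniq -c \
    | awk '{ total += $1 * $2; distinct += $2 }
           END { print total + 0, distinct + 0 }'
}

# Usage (assumes smoketest_debug_pubtypes exists in the current directory):
#   dup_summary smoketest_debug_pubtypes
```

The gap between the two numbers is the share of the section taken up by repeated names.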
Checking out the .debug_str section:
$ dwarfdump --print-strings --format-suppress-offsets target/debug/deps/smoketest-bafe2762ec60a400 \
| sort --numeric-sort --key=3 \
| cut -b -80 \
| tail -n 10
name: length 67206 is 'pin<futures_util::future::try_future::into_future::IntoFu
name: length 67220 is 'get_unchecked_mut<futures_util::future::try_future::into_
name: length 67222 is 'leak<alloc::sync::ArcInner<warp::filter::boxed::BoxingFil
name: length 67229 is 'from<futures_util::future::try_future::into_future::IntoF
name: length 67233 is 'into_pin<futures_util::future::try_future::into_future::I
name: length 67240 is 'new<alloc::boxed::Box<alloc::sync::ArcInner<warp::filter:
name: length 67257 is 'new_unchecked<alloc::boxed::Box<futures_util::future::try
name: length 71736 is 'choice<&str, &str, nom::error::Error<&str>, nom::sequence
name: length 134431 is 'into<&mut alloc::sync::ArcInner<warp::filter::boxed::Box
name: length 134508 is 'into<alloc::boxed::Box<futures_util::future::try_future: